Structure extended multinomial naive Bayes

Authors

  • Liangxiao Jiang
  • Shasha Wang
  • Chaoqun Li
  • Lungan Zhang
Abstract

Multinomial naive Bayes (MNB) assumes that all attributes (i.e., features) are independent of each other given the class, and so it ignores all dependencies among attributes. In many real-world applications, however, this attribute independence assumption is violated, which harms MNB's performance. One of the most direct ways to weaken the assumption is to extend MNB's structure so that it represents attribute dependencies explicitly by adding arcs between attributes. Although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network from high-dimensional text data is almost impossible, mainly because learning its optimal structure from such data is extremely time and space consuming. It would therefore be desirable to have a multinomial Bayesian network model that avoids structure learning while still representing attribute dependencies to some extent. In this paper, we propose a novel model called structure extended multinomial naive Bayes (SEMNB). SEMNB alleviates the attribute independence assumption by averaging all of the weighted one-dependence multinomial estimators. To learn SEMNB, we propose a simple but effective learning algorithm that requires no structure search. Experimental results on a large suite of benchmark text datasets show that SEMNB significantly outperforms MNB and is even markedly better than three other state-of-the-art improved algorithms: TDM, DWMNB, and Rw,cMNB. © 2015 Elsevier Inc. All rights reserved.
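For reference, the independence assumption the abstract describes means MNB scores a document d with word frequencies f_i for class c as log P(c) + Σ_i f_i log P(w_i | c), with no term linking one word to another. The following is a minimal sketch of that standard MNB baseline with Laplace smoothing, assuming documents are given as sparse {word_index: count} dictionaries. The names `train_mnb` and `predict_mnb` and the toy data are hypothetical, and this is not the authors' SEMNB algorithm: the abstract does not specify how SEMNB's one-dependence estimators or their weights are computed.

```python
import math

def train_mnb(docs, labels, vocab_size, alpha=1.0):
    """Train a multinomial naive Bayes model (hypothetical helper).

    docs: list of sparse dicts {word_index: count}; labels: class label per doc.
    Returns (classes, priors, cond), where cond[c][i] estimates P(w_i | c).
    """
    classes = sorted(set(labels))
    n = len(labels)
    priors, cond = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Laplace-smoothed class prior P(c)
        priors[c] = (len(class_docs) + alpha) / (n + alpha * len(classes))
        # Total frequency of each word across all documents of class c
        counts = [0.0] * vocab_size
        for d in class_docs:
            for i, f in d.items():
                counts[i] += f
        total = sum(counts)
        # Laplace-smoothed word probabilities; each word is treated independently
        cond[c] = [(counts[i] + alpha) / (total + alpha * vocab_size)
                   for i in range(vocab_size)]
    return classes, priors, cond

def predict_mnb(model, doc):
    """Return the class maximizing log P(c) + sum_i f_i * log P(w_i | c)."""
    classes, priors, cond = model

    def score(c):
        return math.log(priors[c]) + sum(f * math.log(cond[c][i]) for i, f in doc.items())

    return max(classes, key=score)

# Toy usage (made-up data): vocabulary 0="ball", 1="goal", 2="vote", 3="party"
docs = [{0: 2, 1: 1}, {1: 3}, {2: 2, 3: 1}, {3: 2, 2: 1}]
labels = ["sport", "sport", "politics", "politics"]
model = train_mnb(docs, labels, vocab_size=4)
print(predict_mnb(model, {0: 1, 1: 1}))  # prints "sport"
```

Because every word contributes an independent term to the score, this baseline cannot capture dependencies such as two words co-occurring more often than chance; that is the limitation SEMNB targets by averaging weighted one-dependence estimators.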


Similar resources

A New Fine-Grained Weighting Method in Multi-Label Text Classification

Multi-label classification is one of the important research areas in data mining. In this paper, a new multilabel classification method using multinomial naive Bayes is proposed. We use a new fine-grained weighting method for calculating the weights of feature values in multinomial naive Bayes. Our experiments show that the value weighting method could improve the performance of multinomial nai...

Two-Stage Text Classification Using Bayesian Networks

The "curse of dimensionality" provides a powerful impetus to explore alternative data structures and representations for text processing. This paper presents a method for preparing a dataset for classification by determining the utility of a very small number of related dimensions via a Discriminative Multinomial Naive Bayes process, then using these utility measurements to weight these dimension...

Textual sentiment summarization

Naive Bayes. The multinomial Naive Bayes model on a dictionary is a familiar option for text classification, e.g. (Gale, Church, & Yarowski 1992), (McCallum & Nigam 1998). When there are additional features, the Naive Bayes model also has a natural extension: We simply assume that each additional feature is independent of all the others, conditional upon . In this case, we invert Bayes’ Law by o...

CMPS242 Final Project - A Comparison of Naive Bayes and Boosting

My final project was to implement and compare a number of Naive Bayes and boosting algorithms. For this task I chose to implement two Naive Bayes algorithms that are able to make use of binary attributes, the multivariate Naive Bayes and the multinomial Naive Bayes with binary attributes. For the boosting side of the algorithms I chose to implement AdaBoost, and its close brother AdaBoost*. Both...

Properties of Bayes Factors Based on Test Statistics

This article examines the consistency, interpretation and application of Bayes factors constructed from standard test statistics. Primary conclusions are that Bayes factors based on multinomial and normal test statistics are consistent for suitable choices of the hyperparameters used to specify alternative hypotheses, and that such constructions can be extended to obtain consistent Bayes factor...


Journal:
  • Inf. Sci.

Volume 329

Publication date: 2016